\documentclass[10pt,a4paper]{article}
\usepackage{fullpage}

\usepackage{color}
\usepackage[pdftex]{hyperref}
\usepackage{url}

\begin{document}

\centerline{\sc \large \textbf{The Infinite Factorial Hidden Markov Model}}
\centerline{\textbf{CS886 - Fall 2010}}
\centerline{\textbf{Paper Critique 2}}
\centerline{\textbf{Karim Ali}}
\vspace{2pc}

\section{Overview}
The authors start their paper by discussing the evolution of the Hidden Markov Model (HMM) to the Factorial Hidden Markov Model (FHMM). The main reason
behind this extension is the HMM's limited representational power of latent variables in modeling discrete time series data (e.g. part-of-speech tagging,
dialogue segmentation). To overcome this, the FHMM factors each hidden state into multiple hidden (i.e. latent) state variables. In other words, FHMM formally
defines \textit{M} latent variables as \textit{M} latent Markov chains that evolve according to Markov dynamics at each timestep \textit{t}. Although this
strengthens the representational power of the FHMM for latent variables, it introduces a new free parameter \textit{M}. What the authors are after is
\textbf{learning this parameter from data rather than specifying it beforehand}.

The main contribution of the paper is presenting a non-parametric FHMM, Infinite FHMM (iFHMM), that is based on a new stochastic process for latent feature
representation of time series called the Markov Indian Buffet Process (mIBP). The mIBP can be dervied in three steps:

\begin{enumerate}
 \item Describe a distribution over binary matrices with a finite number of columns.
 \item Integrtate out the parameters of the model by careful selection of the hyperparameters.
 \item Take the limit as the number of features (i.e. columns) goes to $\infty$.
\end{enumerate}

The mIBP provides a matrix \textit{S} which is interpreted as an arbitrarily large set of parallel Markov chains. \textit{S} is then used as the building block
of the iFHMM.

\section{Significance and Originality}
This work is very recent (2009) so it does not have that many citations yet. However, one can understand the significance of the iFHMM in the field of modeling
discrete time series data (e.g. blind source separation) since it can model an infinite (or previously unknown) number of features.

The novelity of the methods presented by the authors stems from relieving the constraint of knowing \textit{M} (number of features) beforehand in the FHMM and
dealing with inifite (or uknown) number of features in the iFHMM. This has various applications in vision, audio processing (e.g. uknown number of speakers),
and natural language processing.

\section{Model Evaluation}
Although the authors presented a very interesting idea (iFHMM), they did not give enough explanation or clarification for some key points in their processes.
My only explanation for that is space limits. Here are some of those key points that I think required more details:

\begin{itemize}
 \item Section 2.1, page 3: how they got to equation 5 from the previous explanation.
 \item Section 2.2, page 3: the derivation for the size of [\textit{S}].
 \item Section 2.3, page 4: I did not get how equation 9 shows that the suggested model is exchangeable in the columns and Markov exchangeable in the rows.
Going back to reference [6] in their paper, I am even more confused. If the columns are exchangeable, then it should not matter where the position of a feature
in the matrix is. However, this is not true as the probability that a customer orders a dish \textit{m} (corresponds to a feature in the matrix) depends on
whether the cutsomer in front of him ordered that dish or not.
 \item Section 3.1, page 5: the authors did not mention what other alternatives other than Laplace(0,1) could be used to IID sample the entries in matrix
\textit{X}. Using Laplace(0,1) was justified because it fits their data better. What about other types of data?
 \item Section 3.1, page 6: the authors did not provide further details about how their likelihood calculations satisfy the two technical conditions for proper
iFHMM likelihoods.
 \item Section 3.2, page 6: no details about how to extend the representation of \textit{S, X, W} were given (first step in the iterative sampling).
\end{itemize}

\section{Experiment Evaluation}
The experimental evaluation done by the authors is very poorly explained and not rigorous enough. My remarks on their experimental evaluation can be summarized
as follows:

\begin{itemize}
 \item Section 3.2, end of page 6: the authors threw some claims without backing them up by empirical evidence, or references to such evidence. For example:
poor performance of naive Gibbs sampler, dynamic programming does not perform better than blocked Gibbs sampler, the bulk of computation is used for estimating
 \textit{X} and \textit{W}.
 \item Much more information was needed about the dataset used for the experiments. For example: which speakers were used in the dataset (there are 34 speakers
in the original dataset); the randomness of the artificial pauses; how long the sentence were; was ther any background noise in the speech.
 \item The authors did not discuss the complexity of their algorithms, or how much CPU time they consume on average.
 \item I was very surprised when I did not find any comparison with other existing blind source (or speech in the authors' case) separation algorithms.
 \item The results obtained using 3 microphones cannot be compared to those obtained using 10 microphones since the authors \textbf{subsampled a noiseless
version of the data} for the former case.
\end{itemize}

\section{Related Work}
The authors did not mention what other algorithms, methods or techniques are used for blind source separation. They only discussed the work on which they based
their processes. It is very interesting to note that 50\% (7 out of 14) of the references had at least one of the authors as a co-author. I think they should
have discussed other people's work rather self-referencing their own previous work, eventhough this paper builds on top of it.

\section{Readability}
The paper was mostly well written with few typos and some punctual errors related to the use of contractions (i.e. don't, shouldn't, etc).

\section{Suggestions}
I think it was a poor choice to publish this paper as a conference paper rather than a journal paper. Most of my comments are related to missing information
that are essential to understand the work of the authors better. In addition, more experiments should be done to evaluate such work and compare it to other
blind source separation algorithms.

\end{document}